Abstract. The famous Gödel incompleteness theorem states that for every
consistent, sufficiently rich formal theory T there exist true statements
that are unprovable in T. Such statements would be natural candidates for
being added as axioms, but how can we obtain them? One classical (and well
studied) approach is to add to some theory T an axiom that claims the
consistency of T. In this paper we discuss another approach, motivated by
Chaitin's version of Gödel's theorem, where axioms claiming the randomness
(or incompressibility) of some strings are probabilistically added, and show
that it is not really useful, in the sense that it does not help us to prove
new interesting theorems. This result (cf. [She06]) answers a question
recently asked by Lipton [LR11]. The situation changes if we take into
account the size of the proofs: randomly chosen axioms may help to make
proofs much shorter (unless NP=PSPACE). This result partially answers the
question asked in [She06].

We then study the axiomatic power of the statements of type "the Kolmogorov
complexity of x exceeds n" (where x is some string, and n is some integer) in
general. They are $\Pi_1$ (universally quantified) statements of Peano
arithmetic. We show (Theorem 5) that by adding all true statements of this
type, we obtain a theory that proves all true $\Pi_1$-statements, and also
provide a more detailed classification. In particular, as Theorem 7 shows,
to derive all true $\Pi_1$-statements it is enough to add one statement of
this type for each n (or even for infinitely many n) if strings are chosen in
a special way. On the other hand, one may add statements of this type for
most x of length n (for every n) and still obtain a weak theory (Theorem 10).
We also study other logical questions related to "random axioms" (hierarchy
with respect to n, Theorem 8 in Section 3.3; independence in Section 3.6;
etc.).



Finally, we consider a theory that claims Martin-Löf randomness of a
given infinite binary sequence. This claim can be formalized in different 
ways. We show that different formalizations are closely related but not 
equivalent, and study their properties. 



1 Introduction 



We assume that the reader is familiar with the notions of Kolmogorov
complexity and Martin-Löf randomness (see [LV08, She00, DH10] for background
information about Kolmogorov complexity and related topics), but since for
our purposes these notions need to be expressed in formal arithmetic, we
recall the basic definitions. The Kolmogorov complexity C(x) of a binary
string x is defined as the minimal length of a program (without input) that
outputs x and terminates. This definition depends on a programming language,
and one should choose one that makes complexity minimal up to an O(1)
additive term. Technically, there exist different versions of Kolmogorov
complexity. Prefix complexity K(x) assumes that programs are self-delimiting.
We consider plain complexity C(x), where no such assumptions are made; any
partial computable function D can be used as an "interpreter" of a
programming language, so D(p) is considered as the output of program p, and
$C_D(x)$ is defined as the minimal length of p such that D(p) = x. Then some
optimal D is fixed (such that $C_D$ is minimal up to an O(1) additive term),
and $C_D(x)$ is called the (plain Kolmogorov) complexity of x and denoted
C(x). Most strings of length n have complexity close to n. More precisely,
the fraction of n-bit strings that have complexity less than n − c is at
most $2^{-c}$. In particular, there exist strings of arbitrarily high
complexity.
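The counting behind the last bound can be checked by brute force. The following is a minimal sketch, assuming a hypothetical toy interpreter (`toy_decoder` is our illustration, is not optimal, and does not appear in the text). The bound itself holds for every interpreter, since there are fewer than 2^(n−c) programs of length less than n − c.

```python
from itertools import product


def all_strings(max_len):
    """Yield all binary strings of length 0..max_len, shortest first."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def toy_decoder(p):
    """Hypothetical toy interpreter D (an assumption for illustration only).

    '0' + w outputs w literally, '1' + w outputs w twice,
    and the empty program does not terminate (None).
    """
    if not p:
        return None
    head, body = p[0], p[1:]
    return body if head == "0" else body + body


def complexity_table(n):
    """Map each n-bit string x to C_D(x), the length of its shortest program."""
    best = {}
    # The literal program '0' + x has length n + 1, so this search is complete.
    for p in all_strings(n + 1):
        x = toy_decoder(p)
        if x is not None and len(x) == n and x not in best:
            best[x] = len(p)  # programs arrive in order of increasing length
    return best


def fraction_below(n, c):
    """Fraction of n-bit strings x with C_D(x) < n - c."""
    table = complexity_table(n)
    return sum(1 for k in table.values() if k < n - c) / 2 ** n
```

For this toy interpreter only the 16 strings of the form ww among the 256 strings of length 8 are compressible, so `fraction_below(8, 2)` is 16/256, well below 2^(−2).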

However, as G. Chaitin pointed out in [Cha71], the situation changes if we
look for strings of provably high complexity. More precisely, we are looking
for strings x and numbers n such that the statement "C(x) > n" (properly
formalized in arithmetic; note that this is a $\Pi_1$-statement) is provable
in formal (Peano) arithmetic PA. Chaitin noted that there is a constant c
such that no statement "C(x) > n" is provable in PA for n > c. Chaitin's
argument is a version of Berry's paradox: Assume that for every integer k we
can find some string x such that "C(x) > k" is provable; let $x_k$ be the
first string with this property in the order of enumeration of all proofs;
this definition provides a program of size O(log k) that generates $x_k$,
which is impossible for large k since "$C(x_k) > k$" is provable in PA and
therefore true (in the standard model).⁴

This leads to a natural idea. Toss a coin n times to obtain a string x of
length n, and consider the statement "C(x) > n − 1000". This statement is true
unless we are extremely unlucky. The probability of being unlucky is less than
$2^{-1000}$. In natural sciences we are accustomed to identifying this with
impossibility. So we can add this statement and be (almost) sure that it is
true; if n is large enough, we get a true non-provable statement and could use
it as a new axiom. We can even repeat this procedure several times: if the
number of iterations m is not astronomically large, $2^{-1000} m$ is still
astronomically small.
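The arithmetic of the last remark is a union bound, which can be evaluated exactly. This is a small sketch; the function name `failure_bound` and the sample value of m are our own illustrative choices.

```python
from fractions import Fraction


def failure_bound(c, m):
    """Union bound on the probability that at least one of m random axioms
    of the form 'C(x) > n - c' is false (each one fails with probability
    less than 2**-c)."""
    return m * Fraction(1, 2 ** c)


# Even a billion repetitions leave the bound astronomically small.
bound = failure_bound(1000, 10 ** 9)
```

Since 10^9 < 2^30, the bound above is below 2^(−970).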

Now the question: Can we obtain a richer theory in this way and get some
interesting consequences, still being practically sure that they are true? The
answers are given in Section 2:

4 Another proof of the same result shows that Kolmogorov complexity is actually not 
very essential here. By a standard fixed-point argument one can construct a program 
p (without input) such that for every program q (without input) the assumption "q
is equivalent to p" (i.e., q produces the same output as p if p terminates, and q 
does not terminate if p does not terminate) is consistent with PA. If p has length k, 
for every x we may assume without contradiction that p produces x, so one cannot 
prove that C(x) exceeds k. 
— yes, this is a safe way of enriching PA (Theorem 1);

— yes, we can get a stronger theory this way (Chaitin's theorem), but

— no, we cannot prove anything interesting this way (Theorem 3).

So the answer to our question is negative; however, as we show in Section 2.3
(Theorem 4), these "random axioms" do give some advantages: while they cannot
help us to prove new interesting statements, they can significantly shorten
some proofs (unless PSPACE=NP).

In Section 3 we switch to a more general question: what is the axiomatic
power of statements "C(x) > n" for different x and n, and how are they related
to each other? They are $\Pi_1$-statements; we show that by adding all true
statements of the form "C(x) > n" as axioms, we can prove all true
$\Pi_1$-statements (Theorem 5).

We show that the axiomatic power of these statements increases as n
increases, and relate this increase to a classification of $\Pi_1$-statements
by their "complexity" (cf. [CC09a, CC09b]). We show that for some c and for
all n one (true) statement C(x) > n for some string x of length n is enough to
prove all true statements C(y) > n − c (Theorem 7), and that all true
statements C(x) > n for given n are not enough to prove any statement
C(y) > n + c (Theorem 8). In other words, the bigger the lower bound for
complexity is, the more powerful the axioms we get.

This result is rather fragile. First, the choice of a string x is important:
for other strings of length n this property is not true (even if we add many
axioms at the same time, see Theorem 10). Also the precision of the complexity
information is crucial (Theorem 11).



Then we show that a random choice of several random axioms (about
complexities) with high probability leads to independent statements (any
combination of these statements with or without negations is consistent with
PA, Theorem 13).

Many of the results mentioned are closely related to corresponding statements
in computation theory. Theorem 15 shows, however, that such a correspondence
does not necessarily work.

The results mentioned above deal with unconditional complexity. Some of
them trivially generalize to conditional complexity, but sometimes the
situation changes. In Theorem 16 we show that there exists some c such that
every true $\Pi_1$-statement can be derived in PA from true statements of the
form C(x|y) > c. (Here C(x|y) is the length of the shortest program that maps
y to x.)

Finally, in Section 4 we switch from finite strings to infinite sequences and
consider theories that say (in some exact way) that a given infinite binary
sequence X is Martin-Löf random.



5 By "statements" we mean statements in the language of PA, and by "true state- 
ments" we mean statements that are true in the standard model of PA. 
2 Probabilistic proofs in Peano arithmetic



2.1 Random axioms: soundness 

Let us describe more precisely how we generate and use random axioms. Assume
that some initial "capital" $\varepsilon$ is fixed. Intuitively,
$\varepsilon$ measures the maximal probability that we agree to consider as
"negligible".

The basic version: Let c be an integer such that $2^{-c} < \varepsilon$ and n
an integer. We choose at random (uniformly) a string x of length n, and add
the statement "C(x) > n − c" to PA (so it can be used together with the usual
axioms of PA).⁶

A slightly extended version: We fix several numbers $n_1, \ldots, n_k$ and
$c_1, \ldots, c_k$ such that $2^{-c_1} + \ldots + 2^{-c_k} < \varepsilon$.
Then we choose at random strings $x_1, \ldots, x_k$ of lengths
$n_1, \ldots, n_k$, and add all the statements "$C(x_i) > n_i - c_i$" for
$i = 1, \ldots, k$.

Final version: In fact, we can allow an even more flexible procedure of adding
random axioms that does not mention Kolmogorov complexity explicitly. Assume
that we have already proved for some finite set A of natural numbers, for some
rational $\delta > 0$ and for some property R(x) (an arithmetical formula with
one free variable x) that the proportion of members n of A such that
$\neg R(n)$ is at most $\delta$. We denote the latter statement by
$(\forall_\delta x \in A)\, R(x)$ in the sequel. This statement can be written
in PA in a natural way. We assume here that A is represented by some formula
A(x); for each $n \in A$ the formula A(n) is provable in PA; for each
$n \notin A$ the formula $\neg A(n)$ is provable in PA; finally, we assume
that the cardinality of A is known in PA (the corresponding formula is
provable).

Then we are allowed to pick an integer n at random inside A (uniformly), and
add the formula R(n) as a new axiom. This step can be repeated several times.
We have to pay $\delta$ for each operation until the initial capital
$\varepsilon$ is exhausted; different operations may have different values of
$\delta$. Note that the axiom added at some step can be used to prove the
cardinality bound at the next steps.⁷

Our previous examples are now special cases: A is the set of all n-bit strings
(formally speaking, the set of corresponding integers, but we identify them
with strings in a natural way), the formula R(x) says that C(x) > n − c, and
$\delta = 2^{-c}$.
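The capital accounting of the "slightly extended version" can be sketched as follows. This is only an illustration under our own naming (`sample_random_axioms` and its parameters are hypothetical); axioms are produced as plain strings rather than arithmetical formulas.

```python
import random
from fractions import Fraction


def sample_random_axioms(lengths, slacks, capital, rng=None):
    """Sample axioms 'C(x_i) > n_i - c_i' and return them with the capital left.

    lengths: the numbers n_1..n_k; slacks: the numbers c_1..c_k.
    Raises ValueError unless 2**-c_1 + ... + 2**-c_k < capital.
    """
    rng = rng or random.Random()
    cost = sum(Fraction(1, 2 ** c) for c in slacks)
    if cost >= capital:
        raise ValueError("total cost must be strictly below the capital")
    axioms = []
    for n, c in zip(lengths, slacks):
        # A uniformly random n-bit string, as in n tosses of a fair coin.
        x = "".join(rng.choice("01") for _ in range(n))
        axioms.append(f"C({x}) > {n - c}")
    return axioms, capital - cost
```

With lengths (8, 16), slacks (3, 4) and capital 1/4, the cost is 1/8 + 1/16, so 1/16 of the capital remains.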

In this setting we consider proof strategies instead of proofs. Normally, a 
proof is a sequence of formulas where each next formula is either an axiom or is 



6 As usual, we should agree on the representation of the Kolmogorov complexity
function C in PA. We assume that this representation is chosen in some natural
way, so all the standard properties of Kolmogorov complexity are provable in
PA. For example, one can prove in PA that the programming language used in the
definition of C is universal. The correct choice is especially important when
we speak about proof lengths (Section 2.3).



7 Actually this is not important: we can replace previously added axioms by
conditions. Assume for example that we first proved
$(\forall_\delta x \in A)\, R(x)$, then for some randomly chosen n we used
R(n) to prove $(\forall_\delta y \in B)\, S(y)$, and finally we added S(m) for
some random $m \in B$. Instead, we can prove without additional axioms the
statement $(\forall_\delta y \in B)\, (R(n) \to S(y))$, and add
$R(n) \to S(m)$, which gives the same result, since R(n) is added earlier in
the proof.
obtained from previous ones by some inference rule. Now we also have random
steps where we go from the formula $(\forall_\tau x \in A)\, R(x)$ to some
R(n) for randomly chosen $n \in A$. Formally, a proof strategy is a finite
tree whose nodes are labeled by pairs $(T, \delta)$, where T is a set of
formulas (obtained so far) and $\delta$ is a rational between 0 and 1 (the
capital at that stage of the process). Nodes can be of two types:

— Deterministic nodes only have one child. If $(T, \delta)$ is the label of a
deterministic node, the label $(T', \delta')$ of its child has
$\delta' = \delta$ (no capital is spent), and $T' = T \cup \{\psi\}$ where
$\psi$ is some statement which can be obtained from T using some inference
rule.

— Probabilistic nodes have several children. If $(T, \delta)$ is the label of
a probabilistic node, the set T contains a formula
$(\forall_\tau n \in A)\, R(n)$ where $\tau \le \delta$, the node has as many
children as elements of A, and the labels of the children are
$(T \cup \{R(n)\}, \delta - \tau)$ where n ranges over A.

The root of the tree has label $(\mathrm{PA}, \varepsilon)$, where PA is the
set of axioms of Peano arithmetic, and $\varepsilon$ is the initial capital.

We will often identify a proof strategy $\pi$ with the corresponding
probabilistic process that starts from the root of the tree, and at every
probabilistic node chooses uniformly a child of the current node, until a leaf
is reached. The logical theory T built at the end of the process (when a leaf
is reached) is, in this context, a random variable. We call it the theory built
by $\pi$.




Fig. 1. A proof strategy represented as a tree 

Given a proof strategy $\pi$ and a formula $\varphi$, we consider the
probability that $\varphi$ is provable by $\pi$, i.e., the total probability of
all leaves where $\varphi$ appears.
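A proof strategy and the probability just defined can be modelled with a small tree structure. The sketch below is only illustrative: the names `Node` and `prob_in_leaf` are ours, formulas are opaque strings, and no inference rules are actually checked.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Node:
    theory: frozenset      # formulas obtained so far (opaque strings here)
    capital: Fraction      # remaining capital at this node
    # No children: leaf. One child: deterministic step.
    # Several children: probabilistic step, one child per element of A.
    children: list = field(default_factory=list)


def prob_in_leaf(node, phi):
    """Probability that the uniform random walk from `node` ends in a leaf
    whose theory contains phi."""
    if not node.children:
        return Fraction(int(phi in node.theory))
    total = sum(prob_in_leaf(child, phi) for child in node.children)
    return total / len(node.children)


# Toy strategy: A = {0, 1, 2, 3}, one random step that costs 1/4,
# and phi is then derived in three of the four branches.
PA = frozenset({"PA"})
branches = []
for n in range(4):
    theory = PA | {f"R({n})"}
    branch = Node(theory, Fraction(0))
    if n != 0:
        branch.children.append(Node(theory | {"phi"}, Fraction(0)))
    branches.append(branch)
after_bound = Node(PA | {"(forall_{1/4} x in A) R(x)"}, Fraction(1, 4), branches)
root = Node(PA, Fraction(1, 4), [after_bound])
```

Here `prob_in_leaf(root, "phi")` evaluates to 3/4: the walk is deterministic up to the single random step, and three of the four equally likely branches end in a leaf containing the formula.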

One can also consider a compressed version of the proof where we omit all
non-branching steps. In this version each vertex is labeled with some theory;
the root has label PA; at a non-leaf vertex some formula
$(\forall_\delta x \in A)\, R(x)$ is provable in the corresponding theory, and
at its children the formulas R(n) for different $n \in A$ are added to that
theory. The probability of $\varphi$ being provable by this strategy is the
probability of leaves where $\varphi$ is provable.

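We can also inductively define the relation
$T \vdash^{p}_{\varepsilon} \varphi$, which means that there exists a
randomized proof starting from T with capital $\varepsilon \ge 0$ that proves
$\varphi$ with probability at least p. The inductive steps are:

\[ \frac{T \vdash \varphi}{T \vdash^{p}_{\varepsilon} \varphi}
   \quad \text{for every } \varepsilon \ge 0 \text{ and } p \in [0, 1]; \]

\[ \frac{}{T \vdash^{0}_{\varepsilon} \varphi}
   \quad \text{for every } \varepsilon \ge 0; \]

\[ \frac{T \vdash (\forall_\delta x \in A)\, R(x) \qquad
         \{\, T, R(n) \vdash^{p_n}_{\varepsilon - \delta} \varphi \,\}_{n \in A}}
        {T \vdash^{p}_{\varepsilon} \varphi}
   \quad \text{if } \delta \le \varepsilon,\
   0 \le p \le \frac{\sum_{n \in A} p_n}{\#A}. \]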

It is easy to see that this inductive definition is equivalent to the original
one: $T \vdash^{p}_{\varepsilon} \varphi$ if and only if there exists a proof
strategy with initial capital $\varepsilon$ that proves $\varphi$ starting from
T with probability p or more. Indeed, if $T \vdash \varphi$, then every capital
$\varepsilon \ge 0$ is enough to prove $\varphi$, and we do not need randomized
steps; everything is provable with probability at least 0; if
$T \vdash (\forall_\delta x \in A)\, R(x)$, then we can start the proof from T
using a randomized step, and the probability to prove $\varphi$ is the average
of the probabilities $p_n$ to prove it starting from the enlarged theories
using the remaining capital $\varepsilon - \delta$.

On the other hand, having a proof strategy, we may use backward induction
(from leaves to the root) to establish the
$\vdash^{p}_{\varepsilon} \varphi$-relation for all the nodes of the tree (for
the current capital $\varepsilon$ and the probability p to prove $\varphi$
starting from the current vertex, or any smaller number).

Having all these equivalent definitions, we nevertheless consider the first def- 
inition of a proof strategy as the main one. This is important when we speak 
about the length of the proof. Note also that we do not assume here that a proof 
strategy is effective in any sense. 

The following theorem says that this procedure can indeed be trusted: 

Theorem 1 (soundness) Let $\pi$ be a proof strategy with initial capital
$\delta$. The probability that the theory T built by $\pi$ contains some false
statement is at most $\delta$.

This theorem has the following immediate corollary.

Corollary 2 Let $\varphi$ be some arithmetical statement. If the probability to
prove $\varphi$ for a proof strategy $\pi$ with initial capital $\varepsilon$
is greater than $\varepsilon$, then $\varphi$ is true. In other words,
$\mathrm{PA} \vdash^{p}_{\varepsilon} \varphi$ for $p > \varepsilon$ implies
that $\varphi$ is true.

Proof (of Theorem 1). This theorem is intuitively obvious because a false
statement appears only if one of the "bad events" has happened (a false axiom
was selected at some step), and the sum of probabilities of all bad events that
happen along some branch is bounded by $\varepsilon$. However, this argument
cannot be understood literally since different branches have different bad
events (after the first branching point).

So let us be more formal and say that a node with label $(T, \delta)$ is good
if all statements in T are true and bad otherwise. We prove the following
property by backward induction: ($\sharp$) if a node has a label
$(T, \delta)$, either (a) it is bad, or (b) it is good and the probability
that, starting from that node, one will reach a bad node is smaller than or
equal to $\delta$. Backward induction means that we prove this property for the
leaves (base case) and then prove that if it holds for all children of a node,
it also holds for that node. It follows immediately that the property holds for
all nodes of the tree, hence it holds at the root, which is what we want (since
the root is good).

Here the base case is immediate: all leaves have the property ($\sharp$);
there is nothing to prove. Now suppose that we have a node u with label
$(T, \delta)$ such that all of its children have the property ($\sharp$). If u
is a deterministic node, there is again nothing to prove, as it is easy to see
from the definition that a deterministic node has the ($\sharp$) property if
and only if its child does. If u is probabilistic, let
$(\forall_\tau x \in A)\, R(x)$ be the formula in T associated to the
probabilistic choice at node u. If u is bad, we are done (the property
($\sharp$) automatically holds at u), so let us assume it is good. This means
in particular that the formula $(\forall_\tau x \in A)\, R(x)$ is true, which
means that at most a fraction $\tau$ of the children of u are bad. By the
induction hypothesis, starting from any good child of u, the probability to
reach a bad node is at most $\delta - \tau$. Thus the probability to reach a
bad node starting from u is at most

\[ \tau + (1 - \tau)(\delta - \tau) = \delta - \tau(\delta - \tau) \le \delta \]

(the last inequality uses the fact that $\delta \ge \tau$). □
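The final estimate can be checked with exact rational arithmetic. In this sketch `tau` stands for the fraction of bad children and `delta` for the capital at the node; the function names and the grid are our own choices, and the check is a sanity test rather than a proof.

```python
from fractions import Fraction


def reach_bad_bound(delta, tau):
    """Bound from the induction step: at most a fraction tau of the children
    are bad, and from each good child the bound is delta - tau."""
    return tau + (1 - tau) * (delta - tau)


def check_grid(steps=20):
    """Verify the identity and the inequality for all tau <= delta on a grid."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    for delta in grid:
        for tau in grid:
            if tau <= delta:
                bound = reach_bad_bound(delta, tau)
                if bound != delta - tau * (delta - tau) or bound > delta:
                    return False
    return True
```

For example, with delta = 1/2 and tau = 1/4 the bound is 1/4 + (3/4)(1/4) = 7/16, which is below 1/2.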

2.2 Random axioms are not useful 

As Chaitin's theorem shows, there are proof strategies that with high
probability lead to some statements that are true but non-provable (in PA).
However, the situation changes if we want to get some fixed statement, as the
following theorem shows:

Theorem 3 (conservation) Let $\varphi$ be some arithmetical statement. If the
probability to prove $\varphi$ for a proof strategy $\pi$ with initial capital
$\varepsilon$ is greater than $\varepsilon$, then $\varphi$ is provable (in PA
without any additional axioms).

Formally, Theorem 3 is a stronger version of Corollary 2, but the message here
is quite different: Corollary 2 is the good news (probabilistic proof
strategies are safe) whereas Theorem 3 is the bad news (probabilistic proof
strategies are useless).

Proof. Let $\varphi$ be a fixed statement that is not provable in PA. We say
that a node u is strong if it has label $(T, \delta)$ with $T \vdash \varphi$,
and is weak otherwise. We again use a proof by backward induction, and we prove
that each node u has the property ($\diamond$): either (a) u is strong or (b) u
is not strong and the probability that starting from u one hits a strong node v
is bounded by $\delta$.

Again, the fact that all leaves have the property ($\diamond$) is immediate.
Now suppose that we have a node u with label $(T, \delta)$ such that all of its
children have the property ($\diamond$). If u is a deterministic node, there is
again nothing to prove, as it is easy to see from the definition that a
deterministic node has the ($\diamond$) property if and only if its child does.
If u is probabilistic, let $(\forall_\tau x \in A)\, R(x)$ be the formula in T
associated to the probabilistic choice at node u. If u is strong, we are done
(the property ($\diamond$) automatically holds at u), so let us assume it is
not. Let p be the probability that one hits a strong node starting from u. Let
us show that the fraction of strong nodes among the children of u is at most
$\tau$. Let $x_1, \ldots, x_\ell$ be the elements of A corresponding to the
strong children of u. For each $i = 1, \ldots, \ell$ we have
$T \cup \{R(x_i)\} \vdash \varphi$, or equivalently
$T \vdash R(x_i) \to \varphi$, by definition of strong vertices. Therefore,

\[ T \vdash [R(x_1) \lor R(x_2) \lor \ldots \lor R(x_\ell)] \to \varphi. \]

If $\ell > \tau \#A$, this fact, together with the assumption that T contains
$(\forall_\tau x \in A)\, R(x)$ (so that T proves that R holds for at least one
of the $x_i$), entails $T \vdash \varphi$, so we get a contradiction. Thus the
fraction of strong children of u is at most $\tau$. Moreover, for any weak
child v the induction hypothesis tells us that the probability to hit a strong
node starting from v is at most $\delta - \tau$. Thus the total probability to
hit a strong node starting from u is at most

\[ \tau + (1 - \tau)(\delta - \tau) \le \delta. \]

Since the root is weak ($\varphi$ is not provable in PA), the probability to
prove $\varphi$ is at most the initial capital $\varepsilon$. □

2.3 Polynomial size proofs 

The situation changes drastically if we are interested in the length of proofs.
The argument used in Theorem 3 gives an exponentially long "conventional" proof
compared with the original "probabilistic" proof, since we need to combine the
proofs for all terms in the disjunction. (Here the length of a probabilistic
proof strategy is measured as the length of the longest branch; note that the
total size of the proof strategy tree may be exponentially larger.) Can we find
another construction that transforms probabilistic proof strategies into
standard proofs with only polynomial increase in length? Probably not; some
reason for this is provided by the following Theorem 4.

Theorem 4 If every probabilistic proof strategy $\pi$ can be transformed into a
deterministic proof whose length is polynomial in the length of $\pi$, then the
complexity classes PSPACE and NP coincide.

Proof. It is enough to consider a standard PSPACE-complete language, the
language TQBF of true quantified Boolean formulas. A standard interactive proof
for this language (see, e.g., [Sip96]; we assume that the reader is familiar
with that proof) uses an Arthur-Merlin protocol where the verifier (Arthur) is
able to perform polynomial-size computations and generate random bits that are
visible to the prover (Merlin) but cannot be corrupted by him. The correctness
of this scheme is based on simple properties of finite fields. This kind of
proof can be easily transformed into a successful probabilistic proof strategy
in our sense. To explain this transformation, let us recall some details.

Assume that a quantified Boolean formula $\varphi$ starting with a universal
quantifier is given. This formula is transformed into a statement that says
that for some polynomial P(x) two values P(0) and P(1) are equal to 1. This
polynomial is implicitly defined by a sequence of operations. To convince
Arthur that it is indeed the case (i.e., that P(0) = P(1) = 1), Merlin shows P
to Arthur (listing explicitly its coefficients; there are polynomially many of
them). In other terms, Merlin notes that the formula

\[ [\forall x\, (P(x) = \tilde{P}(x))] \to [P(0) = 1 \land P(1) = 1] \]

and therefore

\[ [\forall x\, (P(x) = \tilde{P}(x))] \to \varphi \]

are true (and provable in PA). Here P(x) is the polynomial P defined as the
result of the sequence of operations, while $\tilde{P}(x)$ is the explicitly
given expression for P. Then Merlin notes that the implication

\[ [P(r) = \tilde{P}(r)] \to \forall x\, (P(x) = \tilde{P}(x)) \]

is true for most elements r of the finite field and this fact is provable in PA
(using basic results about finite fields), so such an implication can be added
as a new axiom, and it remains to convince Arthur that
$P(r) = \tilde{P}(r)$. Continuing in this way, we get a probabilistic proof
strategy that mimics the interactive proof protocol for TQBF.⁸

This construction gives for each formula in TQBF a probabilistic proof strategy
(in the sense of Section 2.1) of polynomial length that uses some small initial
capital $\varepsilon$. Assume that any probabilistic proof strategy can be
transformed into a conventional proof in PA of polynomial size. Then this
conventional proof is an NP-witness for TQBF, so PSPACE = NP. □
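The "basic result about finite fields" used in the random step is that two distinct polynomials of degree at most d over a field agree in at most d points, so agreement at a random point is strong evidence of identity. A minimal sketch over the integers modulo a prime follows; the prime 101 and the sample polynomials are arbitrary choices of ours, not taken from the protocol.

```python
def poly_eval(coeffs, x, q):
    """Evaluate sum(coeffs[i] * x**i) modulo the prime q (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % q
    return acc


def agreement_fraction(p1, p2, q):
    """Fraction of field elements r (mod q) with p1(r) == p2(r)."""
    hits = sum(poly_eval(p1, r, q) == poly_eval(p2, r, q) for r in range(q))
    return hits / q


q = 101
implicit = [1, 2, 3]        # 1 + 2x + 3x^2
wrong = [1, 2, 3, 1]        # differs from `implicit` by the term x^3
```

The two polynomials differ by x^3, which vanishes only at 0, so they agree on a single element out of 101, within the degree bound 3/101.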



3 Non-randomly chosen axioms 

3.1 Full information about complexities and its axiomatic power 

In this section we study in general the axiomatic power of the axioms
"C(x) > n". Let us start with a simple question: assume that we add to PA all
true statements of this form as axioms. What theory do we get? Note that these
statements are $\Pi_1$-formulas, so their negations are existential formulas
and are provable when true. So with these axioms we have the full information
about the Kolmogorov complexity of every binary string. The axiomatic power of
this information has a simple description:

Theorem 5 If one adds to PA all true statements of the form "C(x) > n", the
resulting theory proves all true $\Pi_1$-statements.

8 This argument assumes that our version of PA allows us to convert the argument 
above into a polynomial-size randomized proof. We do not go into these technical 
details here. 
Proof. The proof is an adaptation of the proof that $\emptyset'$ is
Turing-reducible to the function C (see Proposition 2.1.28 in [Nie09]).

Consider the upper bound $C^t(x)$ for complexity that appears if we restrict
the computation time for decoding by t. The value of $C^t(x)$ is computable
given x and t. As t increases, $C^t(x)$ decreases (or remains the same), and
the limit value is C(x).

For some number N, we consider the minimal value of t such that $C^t(x)$
reaches C(x) for all strings x of length at most N. Let us denote this value by
B(N). Let us show that every terminating program (without input) of size
N − O(log N) terminates in at most B(N) steps. Indeed, a terminating program p
can be considered as a description of a natural number, namely, the number of
steps needed for its termination. Assume that this number is greater than B(N).
Then, knowing p and N, we can compute the table of true complexities of all
strings of length N (since for these strings C coincides with $C^t$, where t is
the number of steps needed for p to terminate). Therefore we can effectively
find a string of length N that has complexity at least N (such a string always
exists). This leads to a contradiction if the total information in p and N is
less than N − O(1); and this is guaranteed if the size of p is less than
N − O(log N). (Note that we need O(log N) additional bits to specify N in
addition to p.)

This shows that the information about the complexities of strings of length N
is enough to solve the halting problem for all programs of size N − O(log N).
So the halting problem is decidable with C as oracle (the result mentioned
above).

It remains to note that this argument can be formalized in PA. Using axioms
that guarantee the complexity of all strings up to length N, we can prove the
value of B(N). Also we can prove that each program p of size at most
N − O(log N) terminates in B(N) steps or does not terminate at all. Therefore,
we can prove that p does not terminate if it is the case. It remains to note
that every $\Pi_1$-statement is provably equivalent to non-termination of some
program (namely, the program looking for a counterexample for this statement).
□
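The definitions of the time-bounded complexity and of B(N) can be illustrated on a toy machine. The sketch below assumes a hypothetical decoder `run` with an artificial step count and a decidable halting problem, so it only shows how the two quantities are defined; it says nothing about an optimal interpreter.

```python
from itertools import product


def strings(max_len):
    """All binary strings of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def run(p):
    """Toy decoder (an assumption): returns (output, steps), or None if p
    never halts. '0' + w prints w in one step; '10' + w prints ww but
    needs 2**len(w) steps; every other program diverges."""
    if p.startswith("0"):
        return p[1:], 1
    if p.startswith("10"):
        w = p[2:]
        return w + w, 2 ** len(w)
    return None


def C_t(x, t):
    """Length of the shortest program printing x within t steps (t >= 1)."""
    for p in strings(len(x) + 1):  # the literal program '0' + x always works
        result = run(p)
        if result is not None and result[0] == x and result[1] <= t:
            return len(p)


def C(x):
    """Limit value of C_t(x, t) as t grows (no time bound)."""
    return C_t(x, float("inf"))


def B(N):
    """Least t such that C_t(x, t) == C(x) for all x of length at most N."""
    t = 1
    while any(C_t(x, t) != C(x) for x in strings(N)):
        t += 1
    return t
```

In this toy model the string 0101 has time-bounded complexity 5 for t < 4 and complexity 4 from t = 4 on, and B(4) = 4: the slow but short programs for strings of the form ww determine B(N).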

Remark: 

1. To be precise, in the last theorem we need to specify how the function C is
represented by an arithmetic formula. It is enough to assume that C is defined
as the minimal length $C_D(x)$ of the program that produces x with respect to
an interpreter D, where the interpreter D is provably optimal, i.e., for all
$D'$, there exists a constant c such that

\[ \mathrm{PA} \vdash \forall x\, [C_D(x) \le C_{D'}(x) + c]. \]

We always assume that C is represented in this way.

2. A sceptic could (rightfully) claim that the exact value of Kolmogorov
complexity can encode some additional irrelevant information (in particular,
about the truth values of $\Pi_1$-statements). For example, it is not obvious a
priori that the theory obtained by adding the full information about the values
of C does not depend on the choice of optimal programming language fixed in the
definition of Kolmogorov complexity. Also one may ask whether a similar result
is true for prefix complexity.
To address all these questions, we may consider weaker axioms. Assume, for
example, that for every x the complexity of x is guaranteed up to a factor 2,
i.e., some axiom $C(x) \ge c_x$ is added where $c_x$ is at least half of the
true complexity of x. Is it enough to add these axioms for all x to prove all
true $\Pi_1$-statements? The answer is positive, and a similar argument can be
used. Let B(N) be the minimal t such that $C^t(x) \le 2c_x$ for all strings x
of length N. Then, knowing N and any number t greater than B(N), we can compute
$C^t(x)$ for all strings x of length N, and get a lower bound
$C^t(x)/2 \le c_x \le C(x)$ for C(x). Taking x such that $C^t(x) \ge N$ for
this t (such an x exists since $C^t$ is an upper bound for C), we find a string
of length N that has complexity at least N/2.

Therefore, every terminating program of length less than N/2 − O(log N)
terminates before B(N) steps, otherwise we could use the termination time to
get a contradiction. Knowing (from the added axioms) the value of C(x) up to a
factor 2, we can prove for some t that this is the case, and therefore prove
non-termination for non-terminating programs (as before).

3.2 Complexity of $\Pi_1$-statements

Let us introduce the notion of complexity of a (closed) $\Pi_1$-statement that
can be considered as a formal version of the ideas of [CC09a, CC09b].

Let U(x) be a $\Pi_1$-statement in the language of PA with one free variable x
(for simplicity we identify strings and natural numbers, so we consider x as a
string variable). We say that U is universal if for every closed
$\Pi_1$-statement T there exists some string t such that

\[ \mathrm{PA} \vdash [T \Leftrightarrow U(t)]. \qquad (*) \]

For a universal U we can define the U-complexity of a closed $\Pi_1$-statement
T as the minimal length of t that satisfies (*). As usual, the following
statement is true:

Theorem 6 There exists an optimal $\Pi_1$-statement U(x) such that the
corresponding complexity function is minimal up to an O(1) additive term.

Proof. Let V(p, x) be a $\Pi_1$-statement with two parameters such that for
every $\Pi_1$-statement W(x) with one parameter there exists some string p such
that

\[ \mathrm{PA} \vdash \forall x\, [W(x) \Leftrightarrow V(p, x)]. \]

Then we let $U(\bar{p}x) = V(p, x)$ where $\bar{p}$ is some self-delimiting
encoding of p, e.g., p with doubled bits and 01 added at the end. □
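The self-delimiting encoding mentioned in the proof (doubled bits followed by 01) can be sketched as follows; the function names `encode` and `split` are ours. The point is that the pair (p, x) can be recovered from the concatenation, at a cost of 2|p| + 2 extra bits, which is a constant for fixed p.

```python
def encode(p):
    """Self-delimiting code of p: every bit doubled, then the marker 01."""
    return "".join(bit + bit for bit in p) + "01"


def split(s):
    """Recover (p, x) from s = encode(p) + x."""
    prefix = []
    i = 0
    while i + 1 < len(s):
        pair = s[i:i + 2]
        if pair == "01":                 # end marker reached
            return "".join(prefix), s[i + 2:]
        if pair[0] != pair[1]:           # '10' cannot occur in a valid code
            raise ValueError("malformed self-delimiting prefix")
        prefix.append(pair[0])
        i += 2
    raise ValueError("end marker 01 not found")
```

For example, `encode("101")` is `11001101`, and `split(encode("101") + "0011")` returns the pair `("101", "0011")`.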

We fix some optimal universal statement U(x) and measure the complexity of
$\Pi_1$-statements with respect to U(x); the complexity function is well
defined up to an O(1) additive term.

Evidently, the complexity of all false $\Pi_1$-statements is the same constant;
the same is true for all provable $\Pi_1$-statements. Since one can construct a
computable sequence of non-equivalent (in PA) closed $\Pi_1$-statements, it is
easy to see that the number of non-equivalent statements of complexity at most
n is $O(2^n)$.

Informally speaking, $\Pi_1$-statements of complexity at most n are statements
about non-termination of programs of complexity at most n, and their provable
equivalents. (The exact formulation allows an O(1)-change in n.)

3.3 The axiomatic power of the complexity table up to length n 

Now we can describe the axiomatic power of the complexity table up to some $n$:
it is roughly equivalent to all true $\Pi_1$-statements of complexity at most $n$. This
informal description requires several clarifications.

First, the complexity (both for strings and for $\Pi_1$-statements) is defined only
up to an $O(1)$ additive term. To take this into account, we say that two sequences
$T_1, T_2, \ldots$ and $S_1, S_2, \ldots$ of theories are $O(1)$-equivalent if $T_n \subseteq S_{n+c}$ and
$S_n \subseteq T_{n+c}$ for some $c$ and all $n$. (The inclusion $U \subseteq V$ means that every theorem of $U$
is a theorem of $V$; we could also write $V \vdash U$.)

Second, we need to specify what we mean by a "complexity table up to $n$".
There are several possibilities. We may consider axioms that give full information
about complexities of all strings of length at most $n$. Or we may consider axioms
that specify the list of all strings of complexity at most $n$ and complexities of all
these strings. All these variants work; one can even consider one specific string
of length $n$, the lexicographically first string $r_n$ of complexity at least $n$, and
consider a theory with only one axiom $C(r_n) \geq n$. In the following theorem
we consider these "minimal" and "maximal" versions and show that they are
essentially equivalent.

Theorem 7 The following sequences of theories are $O(1)$-equivalent:

$A_n$: $C(r_n) \geq n$;

$B_n$: the list of all strings that have complexity at most $n$, and full information
about their complexities (for each string of complexity at most $n$ we add an
axiom specifying its complexity, and also add an axiom that says that all
other strings have complexity greater than $n$);

$C_n$: all true $\Pi_1$-statements of complexity at most $n$.

In particular, adding axioms $C(r_n) \geq n$ for infinitely many $n$, we get PA plus
all true $\Pi_1$-statements.

Proof. It is easy to see that $B_n$ implies $A_n$.

To prove that $C_{n+c}$ implies $A_n$, consider a $\Pi_1$-statement $\mathrm{Rand}(r)$ which says
that $C(r) \geq |r|$ (where $|r|$ is the length of $r$). Then $A_n$ is $\mathrm{Rand}(r_n)$ and therefore
$A_n$ consists of a $\Pi_1$-statement of complexity at most $n + O(1)$.

It remains to show that $A_n$ implies $B_{n-c}$ and $C_{n-c}$. This is similar to the
proof of Theorem [5].

Lemma 1. Let $A(p)$ be some algorithm with input $p$. There exists some $c$ such
that for every $n$ and for every input string $p$ of length at most $n - c$ such that
$A(p)$ does not terminate, the theory $A_n$ proves non-termination of $A(p)$.

This lemma shows that $A_n$ implies $B_{n-c}$ since $A_n$ allows us to prove non-termination
of all non-terminating programs of size at most $n - c$, so the true
values of complexities can be guaranteed if they do not exceed $n - c$. Similarly,
$A_n$ implies $C_{n-c}$, since the $\Pi_1$-statement $U(x)$ claims that the program with input $x$
that searches for a counterexample to this statement never terminates.

It remains to prove the lemma. 

We know that $C(r_n) \geq n$ and for all preceding strings $y$ of length $n$ we have
$C(y) < n$. Consider the minimal $t$ such that $C^t(y) < n$ for all these $y$, where $C^t$
denotes complexity with time bound $t$. We denote
this value by $B(n)$. Let us prove that for a suitable $c$ (that does not depend on
$n$) and for every string $p$ of length at most $n - c$ the computation $A(p)$ either
does not terminate or terminates in at most $B(n)$ steps.
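The quantities $r_n$ and $B(n)$ can be illustrated on a toy example. The sketch below assumes a hypothetical finite "machine" given as a table from programs to pairs (output, number of steps). True Kolmogorov complexity is not computable, so this only mirrors the definitions and is not the construction used in the proof; all names in the code are ours.

```python
from itertools import product

INF = float("inf")


def strings(n):
    """All binary strings of length n in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def c_t(y, t, machine):
    """Toy C^t(y): length of a shortest program that prints y within t steps (INF if none)."""
    return min((len(p) for p, (out, steps) in machine.items()
                if out == y and steps <= t), default=INF)


def first_incompressible(n, machine):
    """Toy r_n: the lexicographically first string y of length n with C(y) >= n."""
    return next(y for y in strings(n) if c_t(y, INF, machine) >= n)


def b(n, machine):
    """Toy B(n): the least t such that C^t(y) < n for every y of length n preceding r_n."""
    r = first_incompressible(n, machine)
    t = 0
    while any(c_t(y, t, machine) >= n for y in strings(n) if y < r):
        t += 1
    return t


# Hypothetical machine (an assumption for illustration): program -> (output, steps).
machine = {"": ("00", 5), "0": ("01", 3), "1": ("00", 1), "00": ("10", 2)}
print(first_incompressible(2, machine), b(2, machine))
```

In this toy table the strings 00 and 01 have programs shorter than 2, the string 10 does not, and three steps suffice to confirm the short descriptions of 00 and 01.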

Every string $p$ determines the number of steps needed for the termination
of $A(p)$. Knowing $p$ and $n$, we find this number $t(p)$ and then take the first $y$
of length $n$ such that $C^{t(p)}(y) \geq n$. If $t(p)$ exceeds $B(n)$, then we get $r_n$. On
the other hand, for every $p$ such that $A(p)$ terminates and for every $n$ we get
some string of length $n$ whose complexity does not exceed $C(p, n) + O(1)$, where
$C(p, n)$ stands for the Kolmogorov complexity of the pair $(p, n)$. Note that $C(p, n)$
is bounded by $|p| + O(\log(n - |p|)) + O(1)$ (we add to $p$ a prefix that is a
self-delimiting description of $n - |p|$). If $c$ is large enough, for every string $p$ of length
$n - c$ or less we get a contradiction (assuming that $B(n)$ steps are not enough for
the termination of $A(p)$). So all terminating computations $A(p)$ for $|p| \leq n - c$ terminate in
$B(n)$ steps.

This reasoning can be formalized in PA. Having $C(r_n) \geq n$ as an axiom, we
can prove that $r_n$ is the first string of complexity at least $n$: all the preceding
strings have a short description, and we can wait long enough to confirm that
it is indeed the case. Then we can prove the value of $B(n)$ and prove that
every computation $A(p)$ for short $p$ either terminates in $B(n)$ steps or does not
terminate at all. □

Remark: To be closer to the initial framework, we may fix some constant $c$
and for every $n$ consider the lexicographically first string $y$ of length $n$ such that
$C(y) \geq n - c$. Then we add to PA the statement $C(y) \geq n - c$ for this $y$ and get
a theory $A^c_n$. For this theory the statement of Theorem [7] is also true (and can
be proved in the same way); of course, the constant in the $O(1)$-equivalence depends
on $c$.

This theorem leaves the following question open. 

Question 1. Characterize precisely the true $\Pi_1$-statements of complexity $n$ which
prove all true $\Pi_1$-statements of complexity $n - O(1)$.

It is natural to ask whether the power of the theories of Theorem [7] strictly increases
as $n$ increases. It is indeed the case, as the following "generalized Chaitin's
theorem" shows:

Theorem 8 There exists some $c$ such that no statement of the form $C(x) \geq n + c$
can be proved in $A_n$ for any $n$.

(Theorem [7] allows us to replace $A_n$ in the statement by $B_n$ or $C_n$.)

Proof. Consider the following program: given the string $r_n$ and some $d$, it starts
to look for PA-consequences of $A_n$ saying that $C(y) \geq n + d$ for some $y$.
(Note that $n$ can be reconstructed as the length of $r_n$.) When (and if) such a $y$
is found, it is the output of the program.

If the program terminates, the complexity of the output is at most $n + O(\log d)$;
on the other hand, it is at least $n + d$, so for all such cases we have
$n + O(\log d) \geq n + d$, and therefore $d \leq O(1)$. □

This theorem has an interesting consequence that can be formulated without
any reference to complexities. Note that for every number $n$ one can write an
arithmetic formula with one parameter $x$ that is provably equivalent to $x = n$
and has length $O(\log n)$. (The standard formula with successor function has
length $O(n)$, but we can use binary representation.) Using this fact, it is easy to
show that the complexity of a $\Pi_1$-formula can be defined up to an $O(1)$-factor as the
minimal length of a provably equivalent formula. We know that one axiom of $A_n$
is enough to prove all axioms of $C_{n-c}$. In terms of lengths we get the following
statement:

Theorem 9 For every $n$ there exists a true $\Pi_1$-formula of size $O(n)$ that implies
in PA every true $\Pi_1$-formula of size at most $n$.
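The remark about binary representation can be made concrete. The following Python sketch builds a closed term for $n$ from 0, 1, + and * by Horner's scheme, and compares its length with the unary numeral. It is only an illustration: the concrete syntax of terms and the function names are our assumptions, not the paper's.

```python
def binary_numeral(n):
    """A term over 0, 1, + and * with value n and length O(log n)."""
    if n < 2:
        return str(n)
    doubled = "((1+1)*" + binary_numeral(n // 2) + ")"
    return doubled if n % 2 == 0 else "(" + doubled + "+1)"


def unary_numeral(n):
    """The standard numeral S(S(...S(0)...)) of length O(n)."""
    return "S(" * n + "0" + ")" * n


print(binary_numeral(5))
print(len(binary_numeral(1000)), len(unary_numeral(1000)))
```

Each recursion level adds at most 12 symbols and there are about $\log_2 n$ levels, which gives the $O(\log n)$ bound; the formula $x = t$ for such a term $t$ is then provably equivalent to $x = n$.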

3.4 Not all random strings are equally useful 

Theorem [7] shows that it is enough to claim the incompressibility of one properly
chosen string $r_n$ to derive all true $\Pi_1$-statements of complexity $n - O(1)$. However,
this is a very special property of this incompressible string, as we see in this
section.

Recall that there are $\Theta(2^n)$ incompressible strings of length $n$. Indeed, there
are at most $2^n - 1$ programs of length less than $n$, but some of them are needed
to produce longer strings of complexity less than $n$. There are $\Theta(2^n)$ such strings;
for example, one can consider strings $xx$ where $x$ is a string of length $n - O(1)$.
So we have $\Theta(2^n)$ incompressible strings of length $n$.
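The counting argument above can be checked by brute force for small $n$. The sketch below uses a hypothetical toy decompressor that maps a program $p$ to the longer string $pp$; a real optimal decompressor is a partial computable function, so this is only an illustration of the counting, under that assumption.

```python
from itertools import product


def strings(n):
    """All binary strings of length n."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def incompressible(n, decompressor):
    """Strings of length n that are not produced by any program shorter than n."""
    short_programs = [p for k in range(n) for p in strings(k)]
    assert len(short_programs) == 2 ** n - 1   # fewer programs than strings of length n
    described = {decompressor(p) for p in short_programs}
    return [x for x in strings(n) if x not in described]


def double(p):
    """Hypothetical toy decompressor: program p describes the string pp."""
    return p + p


print(len(incompressible(4, double)))
```

Here most short programs are spent on strings of other lengths, so only 4 of the 16 strings of length 4 get a description shorter than 4.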

The following result shows that we can add many of them and still have
a theory that is weaker than the theories of Theorem [7]. We formulate this result
for the more general case of $c$-incompressible strings. (A string $x$ is called
$c$-incompressible if $C(x) \geq |x| - c$.)

Theorem 10 Fix a constant $c$. Let $m(n)$ be a computable provable lower bound
for the number of $c$-incompressible strings of length $n$. For example, we can let
$m(n) = 2^n - 2^{n-c}$, or $m(n) = \varepsilon 2^n$ for $c = 0$.

Let $\varphi$ be a formula not provable in PA. Then it is possible to choose $m(n)$
strings of length $n$ (for each $n$) in such a way that PA with axioms
$C(x) \geq n - c$ for all these $x$ does not prove $\varphi$.

(The strings added are necessarily $c$-incompressible, otherwise the theory is
inconsistent and therefore proves $\varphi$.)

Proof. We choose $m(n)$ strings of length $n$ sequentially for $n = 1, 2, \ldots$ in such a
way that $\varphi$ remains unprovable after each step. Assume that we did this for all
lengths smaller than $n$ and $\varphi$ is unprovable in the resulting theory. Let $t = m(n)$.
We want to add $t$ axioms of the form $C(x) \geq n - c$ so that $\varphi$ remains unprovable.
Imagine that this is impossible. Then for every $t$ different strings $x_1, \ldots, x_t$ one can prove
(in the current extension of PA) that

$$(C(x_1) \geq n - c) \land (C(x_2) \geq n - c) \land \ldots \land (C(x_t) \geq n - c) \Rightarrow \varphi.$$

Note that this is the case for every $x_1, \ldots, x_t$, even if one of the statements
$C(x_i) \geq n - c$ is false, since then the left hand side is provably false. Therefore,
we can also prove

$$\Big[\bigvee_{x_1, \ldots, x_t} (C(x_1) \geq n - c) \land (C(x_2) \geq n - c) \land \ldots \land (C(x_t) \geq n - c)\Big] \Rightarrow \varphi,$$

and the left hand side is provable in PA due to the lower bound $m(n)$. So we
conclude that $\varphi$ was already provable before the induction step, contrary to the
induction assumption. □

Remark: In this proof it is important that $m(n)$ is not just a computable lower
bound for the number of $c$-incompressible strings, but a provable lower bound.
Without this condition (if $m$ is only assumed to be a computable lower bound),
one can prove a weaker result: it is possible to add $m(n)$ many axioms of type
"$C(x) \geq n$" (for all $n$) in such a way that the resulting theory does not prove
all true $\Pi_1$-statements. The reason is that the set $\mathcal{C}$ consisting of sequences of
statements of type "$C(x) \geq n$" which are consistent with PA and contain at least
$m(n)$ elements for all $n$ is (modulo proper encoding) a $\Pi^0_1$-subset of $2^\omega$ (the set
of infinite binary sequences). Hence $\mathcal{C}$ must contain an element $S$ (sequence of
statements) which does not compute the halting set $0'$ (this follows from the
low basis theorem of Jockusch and Soare [JS72]). However, any theory proving
all true $\Pi_1$-statements can Turing-compute $C$ and therefore can compute $0'$. So
adding the sequence $S$ does not allow us to prove all true $\Pi_1$-statements. This
type of argument combining computability theory and logic will be an important
tool in Section 31 



3.5 Usefulness of random axioms is fragile 

Theorem [7] says that for carefully chosen $r_n$ the incompressibility axioms are
rather strong (imply all true $\Pi_1$-statements) while for many other incompressible
strings this is not the case (Theorem 10).

In this subsection, we show that the usefulness of the well-chosen incompressibility
axioms is not solely due to the strings $r_n$ themselves, but also to the
accuracy of the axiom. Namely, we prove the following:

Theorem 11 Let $(r_n)$ be a sequence of strings such that $|r_n| = n$ and
$C(r_n) \geq n - O(1)$. There exists a constant $c$ such that the axioms "$C(r_n) \geq n - c\log n$"
(for all $n$) do not prove all true $\Pi_1$-statements.

The first step of the proof of this theorem is the following lemma.

Lemma 2. Let $(r_n)$ be a sequence of strings such that $C(r_n) \geq n$. Suppose
$Y \in 2^\omega$ computes the sequence $(r_n)$, and $Y$ is uniformly Martin-Löf random
with respect to some oracle $Z \in 2^\omega$. Then $C^Z(r_n) \geq n - c\log n$ for some $c$ and
for all $n$.

Proof. To prove the lemma, it is enough to prove the following inequality:

$$C(r) \leq C^Z(r) + C^Y(r) + d^Z(Y) + O(\log C(r)),$$

where $d^Z(Y)$ stands for the expectation-bounded randomness deficiency of $Y$
with oracle $Z$. (This deficiency was introduced by Levin and Gács, see [BGH+11]
for details; $d^Z(Y)$ is finite if and only if $Y$ is Martin-Löf random with oracle $Z$.
In this subsection we assume that the reader is familiar with the definition and
properties of randomness deficiency.)

Indeed, for $r = r_n$ the value of $C(r)$ is at least $n$; the value of $C^Y(r)$ is
$O(\log n)$, since $r_n$ is computable with oracle $Y$ from $n$; the deficiency is finite;
finally, the last term is $O(\log n)$. So we get $C^Z(r_n) \geq n - O(\log n)$.

It remains to prove the inequality above. We may use prefix complexity $K$
instead of plain complexity $C$, since our inequality has logarithmic precision
anyway (all complexities are bounded by $C(r)$ and we have an $O(\log C(r))$ term).
So we need to prove that

$$K(r) \leq K^Z(r) + K^Y(r) + d^Z(Y) + O(\log K(r)).$$

For given $r$ we consider all infinite sequences $Y$ that (being used as oracle)
decrease the (prefix) complexity of $r$ from $K(r)$ to $K^Y(r)$. The set $W$ of all such
sequences is effectively open (since only finite information about an oracle can be
used). It contains $Y$ (by construction) and has small measure: we will show that
its measure is $O(2^{-s})$ where $s = K(r) - K^Y(r)$ is the decrease in complexity.
To describe $W$, it is sufficient to specify $r$ and $K^Y(r)$, so the complexity of
$W$ given $Z$ is bounded by $K^Z(r) + O(\log K(r))$. The last step: if an effectively
open set $W$ of measure $2^{-p}$ has a description of complexity at most $q$, all its
elements have deficiency at least $p - O(\log p) - q$. (We apply this observation
with $p = K(r) - K^Y(r)$ and $q = K^Z(r) + O(\log K(r))$, using $Z$ as an oracle.)
Let us prove two statements used in this argument.

(1) Let $x$ be a string. The probability that a (uniformly) random oracle $Y$ decreases
the prefix complexity of $x$ at least by some $s$ does not exceed $O(2^{-s})$.

Assume that $K(x) = t$. Let us first (uniformly) generate a random oracle $U$
and then generate a string according to the a priori distribution $m(x \mid U)$ using this
oracle. Then we get a lower semicomputable discrete semimeasure on strings,
and it is bounded by the (oracle-free) a priori probability $m(x)$. The probability to
get $x$ in such a process is at least $\Omega(p \cdot 2^{-(t-s)})$, where $p$ is the probability to get
an oracle $U$ that decreases the complexity of $x$ from $t$ to $t - s$ (or more), since the
probability to get $x$ using such an oracle is $\Omega(m(x \mid U)) = \Omega(2^{-(t-s)})$. Since $m$
is maximal, we get $p \cdot 2^{-(t-s)} = O(m(x)) = O(2^{-t})$, so $p = O(2^{-s})$. [Recall
that the discrete a priori probabilities $m(x)$ and $m(x \mid U)$ are equal to $2^{-K(x)}$ and
$2^{-K(x \mid U)}$ respectively up to a constant factor.]

9 The idea of proving this lemma came to us by reading an early draft of Higuchi et
al.'s paper [HHSY] where it was stated as an open problem; by the time we wrote
up our proof and informed them of the solution, Higuchi et al. had independently
solved it.

(2) Let $W$ be an effectively open set of measure $2^{-p}$ whose description has
prefix complexity at most $q$. Then all elements of $W$ have (expectation-bounded)
randomness deficiency at least $p - O(\log p) - q$.

To construct an expectation-bounded test, let us generate a program $v$ for an
effectively open set $V$ with probability $m(v)$, and independently an integer $k$
with probability $m(k)$. Then let us consider the indicator function $I_V$ that is
equal to 1 inside $V$ and to 0 outside $V$, and multiply it by $2^k$. We trim the
resulting function in such a way that its integral (w.r.t. the uniform measure on the
Cantor space) is bounded and the function remains unchanged if the integral
does not exceed 1. Then we add all these functions (with weights $m(v) \cdot m(k)$);
the result is a test (has finite integral). On the other hand, one of the terms
corresponds to $V = W$ and $k = p$, and this term remains untrimmed due to our
assumptions, so on $W$ the test is at least $2^p m(v) m(p)$, and that is what we need, since
$m(v) \geq \Omega(2^{-q})$ and $m(p) = 2^{-O(\log p)}$. □

Corollary 12 Let $(r_n)$ be a sequence of strings such that $C(r_n) \geq n$, and let $\mathcal{C}$ be
a non-empty $\Pi^0_1$-class. Then there exists $Z \in \mathcal{C}$ such that $C^Z(r_n) \geq n - O(\log n)$.

Proof. Let $(r_n)$ be such a sequence of strings. By the Kučera-Gács theorem,
there exists some Martin-Löf random real $Y$ that computes this sequence. By
the basis for randomness theorem, there exists some $Z \in \mathcal{C}$ such that $Y$ is
random relative to $Z$. It remains to apply Lemma [2].



The proof of Theorem 11 now goes as follows. Let $\varphi$ be a sentence not provable
in PA. Consider the $\Pi^0_1$-class of (codes of) complete consistent extensions of
$\mathrm{PA} \cup \{\neg\varphi\}$. It is a non-empty class since $\mathrm{PA} \cup \{\neg\varphi\}$ is a consistent computable
set of axioms. By the above corollary, let $Z$ be a member of this class such that
$C^Z(r_n) \geq n - O(\log n)$. Let $T$ be the theory coded by $Z$. Since $T$ is complete,
it declares the value of Kolmogorov complexity for every string. Let $C_T$ be
this version of Kolmogorov complexity. It is clear that $C^Z \leq^{+} C_T$ since $C_T$ is
computable with oracle $Z$ and satisfies the quantitative restrictions (no more
than $O(2^k)$ strings $u$ have $C_T(u) < k$). Thus $C_T(r_n) \geq n - O(\log n)$, so for some
$c$ the theory $T$ contains the sentences "$C(r_n) \geq n - c\log n$" for all $n$, and also
contains $\neg\varphi$. Hence the axioms "$C(r_n) \geq n - c\log n$" do not prove $\varphi$. □



3.6 Independence of random axioms 

In Section [2] we added (to PA) random axioms of the form $C(x) \geq n - c$ for
several randomly chosen strings $x_1, \ldots, x_m$; we noted that if $m2^{-c} \ll 1$, all the
added axioms are true with probability close to 1. A natural question arises: will
these axioms be independent?

Evidently, with positive probability they can be dependent. Imagine that we
add axioms $C(x_1) \geq n - c$ and $C(x_2) \geq n - c$ for two random strings $x_1$ and
$x_2$ obtained by $2n$ coin tosses. It may happen (with positive probability) that
$x_1 = x_2$ or $x_1$ is so close to $x_2$ that they provably have the same complexity.
(For example, the decompressor used in the definition of $C$ could give the same
complexity to strings that differ only in the last bit.) Or it may happen that the
first axiom is false, then it implies everything (its negation is provable).

However, the probability of dependence between these two axioms is small.
For example, consider the probability $\varepsilon$ that

$$\mathrm{PA} \vdash (C(x_1) \geq n - c) \Rightarrow (C(x_2) \geq n - c) \qquad (*)$$

for a randomly chosen pair $(x_1, x_2)$. We want to show that this probability is
small. Indeed, we can fix $x_2$ in such a way that the probability of (*) for this $x_2$
and random $x_1$ is at least $\varepsilon$. And Theorem [3] says that this is possible only if
$C(x_2) \geq n - c$ is provable (which implies $n = O(1)$ due to Chaitin's theorem) or
$\varepsilon \leq 2^{-c}$. So for large enough $n$ the probability of (*) does not exceed $2^{-c}$.

Similar results are true for other types of dependence. For example, we may
consider three random strings $x_1, x_2, x_3$ and the event

$$\mathrm{PA} \vdash (C(x_1) \geq n - c) \Rightarrow (C(x_2) \geq n - c) \lor (C(x_3) \geq n - c).$$

This event also has probability at most $2^{-c}$ for large enough $n$. Indeed, the right
hand side implies that $C(x_2, x_3) \geq n - c - O(1)$ (the pair has large complexity if
one of its components has large complexity), and we can use the same argument.
One more type of dependence: consider the event

$$\mathrm{PA} \vdash (C(x_1) \geq n - c) \land (C(x_2) \geq n - c) \Rightarrow (C(x_3) \geq n - c); \qquad (**)$$

let its probability (for independent random $x_1, x_2, x_3$) be $\varepsilon$. Then for some $x_3$
the probability of this event (for random $x_1$ and $x_2$) is at least $\varepsilon$. Then we can
use the same argument as in the proof of Theorem [3]. We can prove in PA that
the fraction of pairs $(x_1, x_2)$ such that the left hand side of the implication is false
does not exceed $2 \cdot 2^{-c}$ (each of the two conjuncts is false with probability at
most $2^{-c}$). Therefore, if the fraction of pairs $(x_1, x_2)$ such that (**) happens is
greater than $2 \cdot 2^{-c}$, we can form a disjunction and then prove $C(x_3) \geq n - c$
without additional axioms.

Similar reasoning can be applied to other kinds of dependencies, so three
random axioms

$$C(x_1) \geq n - c, \quad C(x_2) \geq n - c, \quad C(x_3) \geq n - c$$

are independent with probability $1 - O(2^{-c})$ for sufficiently large $n$. Here by
the independence of the statements $T_1, T_2, T_3$ we mean that each of 8 possible
combinations of the form

$$[\neg]T_1, \ [\neg]T_2, \ [\neg]T_3$$

(with or without negations) is consistent with PA.

A similar result (with the same proof) is true for any constant number of
randomly chosen axioms:

Theorem 13 Fix some constant $m$. Let $x_1, \ldots, x_m$ be $m$ independent uniformly
randomly chosen strings of length $n$, and consider $m$ statements

$$C(x_1) \geq n - c, \quad C(x_2) \geq n - c, \quad \ldots, \quad C(x_m) \geq n - c.$$

For large enough $n$ they are PA-independent (all $2^m$ combinations of these statements,
with negations or not, are consistent with PA) with probability $1 - O(2^{-c})$,
where the constant in the $O$-notation depends on $m$ but not on $n$ and $c$.
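The notion of independence used here (all $2^m$ sign patterns are consistent) has a simple propositional analogue that can be checked by brute force. The sketch below is only such an analogue, under the assumption that statements and the base theory are Boolean functions of finitely many variables; it says nothing about provability in PA, and the function name `independent` is ours.

```python
from itertools import product


def independent(statements, theory, num_vars):
    """True if each of the 2^m sign patterns of the statements holds in some model of the theory."""
    models = [v for v in product([False, True], repeat=num_vars) if theory(v)]
    realized = {tuple(bool(s(v)) for s in statements) for v in models}
    return len(realized) == 2 ** len(statements)


def no_axioms(v):
    """The empty base theory: every assignment is a model."""
    return True


# Three unrelated statements: all 8 combinations are consistent.
print(independent([lambda v: v[0], lambda v: v[1], lambda v: v[2]], no_axioms, 3))
# The second statement implies the first, so one combination is impossible.
print(independent([lambda v: v[0], lambda v: v[0] and v[1]], no_axioms, 2))
```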

3.7 The strange case of disjunction 

So far, the results we have established about the axiomatic power of Kolmogorov
complexity were related to its computability-theoretic properties. In this section,
we present an interesting example of a setting where the axiomatic power of a
family of axioms is (in some sense) weaker than its computational power. This
family consists of axioms of type "$C(x) = n_1 \lor C(x) = n_2$", where one axiom
of this type is given for each $x$. The following result of Beigel et al. tells us
that having access, for each $x$, to a pair of possible values of $C$ is enough to
reconstruct $C$:

Proposition 14 ([BBF+06]) Let $f : 2^{<\omega} \to \mathbb{N}^2$ be a function such that for
each $x$, if $f(x) = (n_1, n_2)$ then $C(x) \in \{n_1, n_2\}$. Then the function $f$
Turing-computes the function $C$.

Based on this result and the results presented so far in the paper,
one could conjecture that if for each $x$ we are given a (true) axiom of type
"$C(x) = n_1 \lor C(x) = n_2$", then we are able to prove all true statements of type
"$C(x) = n$". Surprisingly, this turns out to be false.

Theorem 15 Let $\varphi$ be a formula which is not provable in PA. There exists a
family $F$ of true axioms of type "$C(x) = n_1 \lor C(x) = n_2$", where one such axiom
is given for any $x$, such that $\mathrm{PA} \cup F$ does not prove $\varphi$.

Proof. Since $\varphi$ is not provable in PA, the theory $\mathrm{PA} + \neg\varphi$ is consistent and has
some model $\mathfrak{M}$. In this model a formula that defines the Kolmogorov complexity
function determines some function $\tilde{C} : \mathfrak{M} \to \mathfrak{M}$. Note that for standard natural
numbers $n \in \mathfrak{M}$ the values $\tilde{C}(n)$ are standard (since $C(x) \leq \log x + O(1)$ is
provable in PA). The value $\tilde{C}(x)$ may coincide with $C(x)$ or they may differ (in
this case $\tilde{C}(x)$ is smaller, since a standard description for $x$ remains valid in all
models of PA). Then we add the axioms

$$C(x) = \overline{C(x)} \lor C(x) = \overline{\tilde{C}(x)}$$

(containing numerals both for the true value $C(x)$ and the $\mathfrak{M}$-value $\tilde{C}(x)$, for all strings
$x$) to PA. Both the standard model and $\mathfrak{M}$ are models of this theory. Therefore,
these axioms are true in the standard model but do not imply $\varphi$. □

3.8 Axioms on conditional complexity

What happens if we switch to conditional complexity and add true statements
of the form $C(x \mid y) \geq n$? The unconditional complexity is a special case of the
conditional one, so if we add all true statements of this form, we can prove all true
$\Pi_1$-statements.

However, there is an important difference between the conditional and unconditional
cases. The following theorem shows that now we do not need unbounded
values of $n$ to get all true $\Pi_1$-statements.

Theorem 16 There exists some constant $c$ such that PA together with all true
statements of the type $C(x \mid y) \geq c$ proves all true $\Pi_1$-statements.

Proof. Strangely, the proof is quite indirect here. It uses the results from recursion
theory about DNC (= diagonally non-computable) functions saying that (1) the
mass problem of constructing a DNC function is equivalent to the mass problem
"given $n$, construct some string of complexity at least $n$", and that (2) every
enumerable oracle that computes a DNC function is Turing complete. Since we
need to translate these results from computation language to proof language, we
need to reproduce them first for our special case.

Lemma 3. There exists a constant $c$ with the following property: having access
to an oracle that for every string $x$ gives us some string $y$ such that $C(y \mid x) \geq c$,
we can for every $n$ compute a string of complexity at least $n$.

To prove the lemma, fix some programming language. Given some program $p$
and input $u$, we cannot say whether the computation of $p$ on input $u$ terminates.
However, we know that if it terminates, the output will have $O(1)$-complexity
conditional to $p$ and $u$. The constant in $O(1)$ depends on the programming
language (and on the choice of the specific complexity function), but not on $p$ and $u$.
So, having an oracle that for given $x$ produces $y$ such that $C(y \mid x) \geq c$ where $c$ is
a bigger constant, we are able for every $p$ and $u$ to specify some $y$ that is guaranteed
to be different from the output of $p$ on $u$ (if this output exists).

Our task is, however, more difficult: we want to construct a string of com- 
plexity greater than n, so we need this string to be different from the outputs 
of many computations (for all programs of length n or less) . This can be done 
as follows: we may assume that the outputs of computations are not strings 
but infinite sequences of strings that have only finitely many non-empty terms. 
Such "sequences with finite support" form a countable set and one can establish 
a computable one-to-one correspondence between such sequences and strings. 
Now, having finitely many computations whose outputs are such sequences, we 
construct a sequence that differs from the first computation in the first term, 
from the second computation in the second term, etc. This can be done using 
the oracle, as we have seen above. The lemma is proven. □
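The diagonal step of this proof can be sketched in Python. Finite-support sequences are stored as lists (missing terms are empty strings), and the function `differ` is a hypothetical stand-in for the oracle. In the proof the oracle is given only the program and its input and never sees the output, which may not exist, whereas the toy version below is handed the outputs directly; this is a simplifying assumption.

```python
def term(seq, i):
    """The i-th term of a finite-support sequence stored as a list."""
    return seq[i] if i < len(seq) else ""


def diagonalize(outputs, differ):
    """Build a sequence whose i-th term differs from the i-th term of outputs[i]."""
    return [differ(term(seq, i)) for i, seq in enumerate(outputs)]


def differ(s):
    """Hypothetical stand-in for the oracle: returns some string different from s."""
    return s + "1"


outputs = [["0", "11"], [], ["", "", "101"]]
diagonal = diagonalize(outputs, differ)
print(diagonal)
```

The resulting sequence differs from the first output in the first term, from the second output in the second term, and so on, so it is not the output of any of the listed computations.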

Returning to Theorem [16], consider the following process that uses an oracle
providing exact values for $C(\cdot \mid \cdot)$. Given some $n$, we construct the string of
complexity greater than $n$ as described above. When a string different from the
output of some program $p$ on some input $u$ is needed, we take the first string $y$
such that $C(y \mid u, p)$ exceeds some constant $c$ (large enough and fixed in advance).
Finally we produce some string $r_n$ that has complexity greater than $n$.

While looking for the first strings (denoted by $y$ in the previous paragraph)
we observe that all the previous strings have complexity (with required conditions)
less than $c$. Consider the time $t$ needed to establish this fact, i.e., some $t$
such that not only the true conditional complexity $C$ but also its upper bound $C^t$
(complexity with time bound $t$) becomes less than $c$. Let us show that every
number greater than $t$ has complexity at least $n - O(\log n)$. Indeed, knowing
some number $t' > t$ and $n$, we do not need the oracle anymore, since we can use
$C^{t'}$ instead of $C$ and get the same strings $y$. This procedure is computable, so it
cannot increase complexity more than by $O(1)$, and we get a string of complexity
greater than $n$. Since we need only $O(\log n)$ bits to specify $n$, the number $t'$
should have complexity at least $n - O(\log n)$. As before, this implies that $t$ steps
are enough for every terminating program (without input) of size $n - O(\log n)$
to terminate, otherwise this program would describe the number of steps needed
for termination, and this number $t'$ would be greater than $t$ and still have small
complexity.

Now we need to formalize the reasoning above in PA. Having all true statements
of the form $C(u \mid v) \geq c$ as axioms, we provably know the first strings $y$ that
are found during the described process. (Indeed, we can also prove that previous
strings have small complexity, since this is an existential statement.) We can
also prove that the process intended to generate a string of complexity greater
than $n$ achieves its goal. Then we can provably establish the value of $t$, and also
prove that every number greater than $t$ has complexity at least $n - O(\log n)$ (with some
specific constant in the $O$-notation). Finally, we prove that every program of
size $n - O(\log n)$ that does not terminate in $t$ steps never terminates, therefore
proving all true $\Pi_1$-statements of complexity at most $n - O(\log n)$. Since $n$ is
arbitrary, we can prove all true $\Pi_1$-statements. □

Remark: It would be nice to find a more direct way to construct a string of
arbitrarily high complexity if we know all the pairs $(u, v)$ such that $C(u \mid v) \geq c$.
One can try to start with some $x_0$ that has complexity at least $c$, then find
some $x_1$ such that $C(x_1 \mid x_0) \geq c$, then $x_2$ such that $C(x_2 \mid x_0, x_1) \geq c$, etc. This does
not work because in the formula for the complexity of a pair for plain complexity
we have logarithmic error terms that can compensate $c$, and in the prefix version
we also have the prefix complexity in the condition.

4 Adding information about Martin-Löf random
sequences

Up to now we considered additional axioms saying that some strings have high
complexity. In this section we want to extend PA in a different way and claim for
some infinite sequence $X$ that $X$ is Martin-Löf random. Some refinements are
needed since an infinite sequence (unlike a string) cannot be made a part of one
axiom. There are several ways to do this. Let us consider different possibilities.

4.1 The theory $\mathrm{MLR}_c(X)$ and its properties

One natural way to express that $X = x_0x_1\ldots$ is Martin-Löf random is to make
use of the Levin-Schnorr theorem, by fixing some constant $c$ and adding the axioms
"$K(X \upharpoonright n) \geq n - c$" for all $n$. Here $K$ stands for prefix complexity (note that
in the previous section we considered plain complexity, which was more natural
then, but all the results of that section also hold with $K$ in place of $C$), and
$X \upharpoonright n$ stands for the $n$-bit prefix $x_0x_1 \ldots x_{n-1}$ of $X$. We denote by $\mathrm{MLR}_c(X)$
the theory PA enriched by these additional axioms.

Let us fix $X$ and consider $\mathrm{MLR}_c(X)$ for different $c$. For small $c$ some of the
additional axioms can be false; then their negation is provable and $\mathrm{MLR}_c(X)$ is
inconsistent. As $c$ increases, the theory $\mathrm{MLR}_c(X)$ becomes weaker. If $X$ is
Martin-Löf random, then for large enough $c$ all statements in $\mathrm{MLR}_c(X)$ are true and
$\mathrm{MLR}_c(X)$ is consistent. Note also that all axioms of $\mathrm{MLR}_c(X)$ are $\Pi_1$-statements,
so in the latter case $\mathrm{MLR}_c(X)$ is a part of the theory considered above (PA plus
all true $\Pi_1$-statements).

Intuitively, if we believe that "$X$ is Martin-Löf random" implies some $\varphi$, then
we would expect $\varphi$ to be provable in all theories $\mathrm{MLR}_c(X)$. The next theorem
shows that any such formula $\varphi$ is in fact already provable in PA.

Theorem 17 Let $X$ be a Martin-Löf random sequence. If $\varphi$ is provable in all
theories $\mathrm{MLR}_c(X)$, then $\varphi$ is provable in PA.

Proof. Let $X$ be a Martin-Löf random sequence. We need to show that a formula
$\varphi$ provable in $\mathrm{MLR}_c(X)$ for all $c$ is actually provable in PA alone. Assume that
$\varphi$ is not provable in PA; we will show that every sequence $X$ such that $\varphi$ is provable
in all $\mathrm{MLR}_c(X)$ is not random. This is done by constructing a Martin-Löf test
that covers $X$. For every $c$ consider the set $A^c_\varphi$ of infinite binary sequences:

$$A^c_\varphi = \{Y \mid \mathrm{MLR}_c(Y) \vdash \varphi\}.$$

This is an effectively open set in the Cantor space (recall that each derivation
uses only a finite number of axioms). We claim that the uniform measure of this
set is small. More precisely, $\mu(A^c_\varphi) \leq 2^{-c}$, so the sets $A^c_\varphi$ form a Martin-Löf test that
covers $X$ (if $\varphi$ is provable in $\mathrm{MLR}_c(X)$ for all $c$).

To prove that claim, suppose the contrary, i.e., $\mu(A^c_\varphi) > 2^{-c}$. Since an effectively
open set is a union of intervals, this implies that there exists an integer $N$
and a set $S$ of more than $2^{-c} \cdot 2^N$ strings $u$ of length $N$ such that the formula
$\mathrm{MLR}_c(u) \Rightarrow \varphi$ is provable for all $u \in S$, where $\mathrm{MLR}_c(u)$ says that $K(v) \geq |v| - c$
for every prefix $v$ of $u$.
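The passage from the measure of a union of intervals to a count of strings of a fixed length $N$ can be illustrated as follows. The sketch assumes a finite family of intervals, each given by a binary prefix of length at most $N$ (an effectively open set is an infinite union, but a finite part of it already has measure greater than $2^{-c}$); the function names are ours.

```python
from fractions import Fraction
from itertools import product


def covered_strings(prefixes, n):
    """All strings of length n that extend at least one of the given prefixes."""
    all_strings = ["".join(bits) for bits in product("01", repeat=n)]
    return [s for s in all_strings if any(s.startswith(p) for p in prefixes)]


def measure(prefixes, n):
    """Uniform measure of the union of the intervals, computed at resolution n."""
    return Fraction(len(covered_strings(prefixes, n)), 2 ** n)


prefixes = ["0", "01", "110"]        # overlapping intervals are counted once
S = covered_strings(prefixes, 3)
print(len(S), measure(prefixes, 3))
```

With $c = 1$ and $N = 3$ this union has measure $5/8 > 2^{-c}$, and accordingly the set $S$ contains more than $2^{-c} \cdot 2^N = 4$ strings.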



We can then design a proof strategy in the sense of Section 2.1. This strategy
starts with capital $2^{-c}$, then proves in PA that there are at least
$(1 - 2^{-c}) \cdot 2^N$ strings $u$ of length $N$ which make $\mathrm{MLR}_c(u)$ true
(this statement is used to prove the Levin-Schnorr theorem relating Martin-Löf
randomness and prefix complexity, and is provable in PA). Then the strategy picks
a string $u$ of length $N$ at random and adds the axiom $\mathrm{MLR}_c(u)$. By the
assumption on the cardinality of $S$, with probability greater than $2^{-c}$ we can
prove $\varphi$. By Theorem 3, this would mean that $\varphi$ is already provable in
PA. □
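
The counting behind this contradiction can be spelled out as follows. This is only
a sketch of the step, not part of the original argument: the names $I_u$ (the
interval of sequences extending $u$) and $B_N$ (the strings of length $N$ for which
the added axiom is false) are introduced here purely for illustration.

```latex
% Sketch only; I_u and B_N are illustrative names that do not appear in the paper.
% I_u = set of infinite sequences extending the string u, so \mu(I_u) = 2^{-|u|}.
\begin{align*}
  % A_c^\varphi is the union of the intervals I_u with PA |- MLR_c(u) -> \varphi.
  % Finitely many of them already have total measure > 2^{-c}; padding their
  % strings to a common length N keeps the implication provable.
  \mu(A_c^\varphi) > 2^{-c}
    \;&\Longrightarrow\; \exists N\ \exists S \subseteq \{0,1\}^N :\
       |S| > 2^{-c} \cdot 2^N, \\
  % u is drawn uniformly among the 2^N strings of length N.
  \Pr_u[\, u \in S \,] \;&=\; \frac{|S|}{2^N} \;>\; 2^{-c}, \\
  % B_N = strings u of length N for which MLR_c(u) is false.
  \Pr_u[\, u \in B_N \,] \;&=\; \frac{|B_N|}{2^N} \;\le\; 2^{-c}.
\end{align*}
```

So the random axiom is false with probability at most $2^{-c}$, while it yields a
proof of $\varphi$ with probability strictly greater than $2^{-c}$; this is the
situation to which the theorem on proof strategies is applied.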
However, this theorem does not exclude the possibility that for some $X$ and $c$
the theory $\mathrm{MLR}_c(X)$ is powerful, even powerful enough to prove all true
$\Pi_1$-statements. The next theorem rules out this possibility.

Theorem 18 If the theory $\mathrm{MLR}_c(X)$ is consistent, it does not prove all true
$\Pi_1$-statements.

Proof. We start with the following observation. Suppose that for some random $X$
and for some $c$ the theory $\mathrm{MLR}_c(X)$ proves all true $\Pi_1$-statements while
being consistent. Then, using $X$ as an oracle, one can enumerate all theorems of
$\mathrm{MLR}_c(X)$, so with oracle $X$ one can decide which $\Pi_1$-statements are true
and which are false (false $\Pi_1$-statements can be enumerated without an oracle),
i.e., $X$ computes the halting problem. We can now make use of this additional
information.

Let $X$ be a random sequence such that $K(u) \ge |u| - c$ for all prefixes $u$ of $X$,
and $X$ computes $0'$ (the halting problem). Identifying complete arithmetical
theories with infinite sequences (where the value of the $n$-th bit is 1 if and only
if the $n$-th arithmetical formula, for some canonical order, is in the theory), we
see that the set of complete and consistent extensions of PA is a non-empty
$\Pi^0_1$ class. The Turing degrees of the elements of this class are commonly referred
to as PA-degrees. By the low basis theorem for randomness (see [DHMN05,
Proposition 7.4] or [DH10, Theorem 8.7.2]), there exists a complete and consistent
extension $T$ of PA such that $X$ is Martin-Löf random relative to $T$. Since
$T$ is complete and consistent, for each $n$ there is a unique value $k_n$ such that
$T \vdash$ "$K(n) = k_n$". Let $H \colon \mathbb{N} \to \mathbb{N}$ be the function
$n \mapsto k_n$. Let us call $\mathbf{a}$ the Turing degree of $T$ and let us make three
observations. First, $H$ is computable relative to $\mathbf{a}$. Second, we must have
$H(n) \le K(n)$: indeed, if $K(n) < H(n) = k_n$, then the statement "$K(n) < k_n$" is a
true $\Sigma_1$-statement and therefore provable in PA, and a fortiori in $T$, a
contradiction with the definition of $k_n$. Third, since it is provable in PA that
"$\sum_n 2^{-K(n)} \le 1$", this must also hold in $T$ and therefore
$\sum_n 2^{-H(n)} \le 1$ (note that these inequalities can be considered as statements
about finite sums). Since $H$ is computable relative to $\mathbf{a}$, Levin's coding
theorem implies that $K^{\mathbf{a}} \le H + O(1)$. To sum up:
$K^{\mathbf{a}} \le H + O(1) \le K + O(1)$. Since $X$ is $\mathbf{a}$-random and
$H \ge K^{\mathbf{a}} - O(1)$, we know that $H(X \upharpoonright n) - n \to +\infty$ (here we
use the fact that every $\mathbf{a}$-Martin-Löf random $X$ has the property
$K^{\mathbf{a}}(X \upharpoonright n) - n \to +\infty$; see [Cha87] for the unrelativized
version), so there is an $N$ such that $H(X \upharpoonright n) \ge n - c$ for all
$n \ge N$.
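
For reference, the chain of inequalities used in this paragraph can be laid out in
one place. This is a restatement under the usual assumption that strings are
identified with natural numbers, so that $H$ and $K$ apply to both; the $O(1)$ terms
are constants that do not depend on $u$ or $n$.

```latex
% Restatement of the inequalities from the paragraph above (sketch).
\begin{align*}
  % Coding theorem relative to a (left) and the second observation (right).
  K^{\mathbf{a}}(u) - O(1) \;&\le\; H(u) \;\le\; K(u)
      && \text{for all strings } u, \\
  % Apply the left inequality to prefixes of the a-random sequence X.
  H(X \upharpoonright n) - n \;&\ge\; K^{\mathbf{a}}(X \upharpoonright n) - n - O(1)
      \;\longrightarrow\; +\infty
      && (n \to \infty), \\
  % A quantity tending to +infinity is eventually at least -c.
  H(X \upharpoonright n) \;&\ge\; n - c
      && \text{for all sufficiently large } n.
\end{align*}
```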

Now consider the set of strings $R = \{u : H(u) \ge |u| - c\}$. This set is
$\mathbf{a}$-computable and contains all initial segments of $X$, except (possibly) the
first $N$. Now, consider PA with additional axioms $K(u) \ge |u| - c$ for all prefixes
$u$ of $X$ and for all $u \in R$. This theory (called $T'$ in the sequel) is
$\mathbf{a}$-computable because $R$ is $\mathbf{a}$-computable and we add only finitely
many axioms for prefixes of $X$ that are not in $R$. Moreover, it extends
$\mathrm{MLR}_c(X)$ and all $T'$-theorems are true (recall that $H \le K$ by
construction). If $\mathrm{MLR}_c(X)$ proved all true $\Pi_1$-statements, then so would
the stronger theory $T'$, hence $T'$ would compute $0'$. But $T'$ is
$\mathbf{a}$-computable, so this would yield $\mathbf{a} \ge_T 0'$. This is impossible:
our initial assumption is that $X$ is $\mathbf{a}$-random; if $\mathbf{a} \ge_T 0'$, then
$X$ would be $0'$-random (a.k.a. 2-random), but no $0'$-random sequence can compute
$0'$ (see for example [Nie09, Theorem 5.3.16]), and $X$ does compute $0'$ by our
initial remark. □



4.2 The theories $\mathrm{MLR}'_c(X)$ and $\mathrm{MLR}''_c(X)$

Another possibility is to extend the language of PA by a unary function symbol
$f$. Then we can add one axiom saying that $f(0)f(1)\ldots$ is a binary sequence
whose prefixes of every length $n$ have complexity at least $n - c$ (note that one
single formula is now enough to claim that all strings $f(0)f(1) \ldots f(n)$ have
complexity at least $n - c$), and a series of axioms that specify the elements of
the sequence: $f(0) = x_0$, $f(1) = x_1$, etc., where $x_i$ is the $i$-th bit of $X$. To
make this theory reasonable, we also need to add induction over formulas that
contain $f$ (just to prove that $f$ has some prefix of each length). Evidently, this
theory, which we denote by $\mathrm{MLR}'_c(X)$, proves all the axioms of
$\mathrm{MLR}_c(X)$.

Moreover, it proves some statements that look stronger than those of
$\mathrm{MLR}_c(X)$. Indeed, for each $n$ we can prove in $\mathrm{MLR}'_c(X)$ the
statement $\mathrm{Ext}_c(X \upharpoonright n)$, where $\mathrm{Ext}_c(u)$ says that for
every $n \ge |u|$ the string $u$ is a prefix of some string $v$ of length $n$ such that
for every prefix $w$ of $v$ the inequality $K(w) \ge |w| - c$ holds. (Let $v$ be the
prefix of $f(0)f(1)\ldots$ of length $n$.)

So, adding all statements $\mathrm{Ext}_c(X \upharpoonright n)$ to PA, we get some
intermediate theory between $\mathrm{MLR}_c(X)$ and $\mathrm{MLR}'_c(X)$, which we denote
by $\mathrm{MLR}''_c(X)$. Note that this theory, unlike $\mathrm{MLR}'_c(X)$, does not
contain the additional function symbol $f$. It is natural to ask how these three
theories compare to one another. Below are some answers:

Theorem 19 For every $X$ and $c$, the theory $\mathrm{MLR}'_c(X)$ is a conservative
extension of $\mathrm{MLR}''_c(X)$: both theories prove the same arithmetical formulas
(without $f$).

Proof. Assume that $\mathrm{MLR}'_c(X) \vdash \varphi$ and the formula $\varphi$ does not
involve the symbol $f$. We need to prove that $\mathrm{MLR}''_c(X) \vdash \varphi$. Only
finitely many axioms $f(i) = x_i$ are involved in the proof of $\varphi$ from
$\mathrm{MLR}'_c(X)$. Let $N$ be the maximal $i$ that appears in these axioms. Now
suppose that $\mathrm{MLR}''_c(X)$ does not prove $\varphi$. Consider a model
$\mathfrak{M}$ of $\mathrm{MLR}''_c(X) \cup \{\neg\varphi\}$. We want to interpret $f$ in
this model and get a model of $\mathrm{MLR}'_c(X)$ restricted to the axioms
"$f(n) = x_n$" for $n \le N$, in which $\varphi$ is false, thus contradicting the
assumption. Note that $\mathfrak{M}$ may be a non-standard model.

Consider the string $x = x_0 \ldots x_N$ and all its extensions $y$ with the following
property: all prefixes $v$ of $y$ satisfy the inequality $K(v) \ge |v| - c$. If the
axioms of $\mathrm{MLR}''_c(X)$ are true in the standard model, these $y$ form an
infinite subtree. König's lemma then guarantees that this subtree has an infinite
branch. Moreover, we get a definable branch if we take on each level the leftmost
vertex in the subtree that has arbitrarily long extensions in the subtree. Now we
can formalize this argument in PA and observe that the same formula defines some
branch (i.e., a function $f$) in $\mathfrak{M}$, and all vertices $v$ on this branch
satisfy the inequality $K(v) \ge |v| - c$ in $\mathfrak{M}$. This allows us to extend
$\mathfrak{M}$ to a model of (restricted) $\mathrm{MLR}'_c(X)$ where $\varphi$ is false,
thus contradicting our assumption. (Note that induction for formulas containing
$f$ is possible in $\mathfrak{M}$ since $f$ is definable.) □
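
One way to write down the leftmost-branch definition used in this proof is sketched
here. It is an illustration only, under the assumption that prefixes are indexed so
that $f \upharpoonright n = f(0) \ldots f(n-1)$; the symbol $\preceq$ for the prefix
relation is introduced just for this sketch, and $\mathrm{Ext}_c$ is the predicate
defined earlier in this section.

```latex
% Sketch only. Assumptions: f|n denotes f(0)...f(n-1); "y b" is y followed by the bit b;
% w \preceq z means that w is a prefix of z.
\begin{align*}
  % Ext_c(y): y has extensions of every length whose prefixes are all incompressible.
  \mathrm{Ext}_c(y) \;&\Longleftrightarrow\; \forall m \ge |y|\;\, \exists z \succeq y\;
      \bigl(|z| = m \;\wedge\; \forall w \preceq z\;\, K(w) \ge |w| - c\bigr), \\
  % The first N+1 values are fixed by the finitely many axioms f(i) = x_i.
  f(i) \;&=\; x_i \qquad (i \le N), \\
  % Afterwards, always take the leftmost child that still has arbitrarily long extensions.
  f(n) \;&=\; \min\bigl\{\, b \in \{0,1\} :
      \mathrm{Ext}_c\bigl((f \upharpoonright n)\, b\bigr) \,\bigr\} \qquad (n > N).
\end{align*}
```

The minimum is well defined at every step because $\mathrm{Ext}_c(y)$ implies
$\mathrm{Ext}_c(y0)$ or $\mathrm{Ext}_c(y1)$: if both failed, $y$ would have no admissible
extension beyond some fixed length. The starting point $\mathrm{Ext}_c(x_0 \ldots x_N)$
is itself one of the axioms added to PA.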



In contrast to Theorem 18, there is a theory $\mathrm{MLR}'_c(X)$ which is consistent
and proves all true $\Pi_1$-statements.

Theorem 20 For every $c$, there exists a sequence $X$ such that $\mathrm{MLR}''_c(X)$ is
consistent and proves all true $\Pi_1$-statements.

Proof. In this proof we use again the ideas from the proof of Theorem 7.
Let

$$\mathcal{P}_c = \{\, X \in 2^\omega \mid \forall n\;\, K(X \upharpoonright n) \ge n - c \,\}.$$

Note that the set of all finite prefixes of sequences in $\mathcal{P}_c$ is a co-c.e.
set. Denote by $Z$ the leftmost path of $\mathcal{P}_c$ ($Z$ can be seen as a Chaitin
$\Omega$ number).

By definition, the theory $\mathrm{MLR}''_c(Z)$ consists of the axioms
$\mathrm{Ext}_c(Z \upharpoonright n)$. For each $n$, $Z \upharpoonright n$ is the first (in
lexicographic order) string $\sigma$ of length $n$ such that $\mathrm{Ext}_c(\sigma)$
holds. Since $\mathrm{Ext}_c$ is a $\Pi_1$ predicate, we can enumerate the strings $\tau$
such that $\neg\mathrm{Ext}_c(\tau)$. Denote by $t$ the step in this enumeration when we
get the list of all strings smaller than $Z \upharpoonright n$. Given $c$ and
$Z \upharpoonright n$ we can compute this number $t$. Moreover, given
$\mathrm{Ext}_c(Z \upharpoonright n)$ we can prove that any time $s > t$ has complexity at
least $n - O(\log n)$ (similar to the proof of Theorem 7). The rest of the proof is
identical to that of Theorem 7. □
However, the theories $\mathrm{MLR}''_c(X)$ are still weak in the sense of Theorem 17:
if a formula $\varphi$ is provable in all theories $\mathrm{MLR}''_c(X)$ for a given $X$,
then $\varphi$ is already provable in PA. The proof is identical.

This has the following interesting corollary. 

Corollary 21 Let $\mathrm{MLR}''(X)$ be the statement $(\exists c)\, \mathrm{MLR}''_c(X)$
(which can be made in PA). Then $\mathrm{MLR}''(X)$ is conservative over PA.

Proof. From the above discussion, if $\varphi$ is a formula without the symbol $f$ that
is not provable in PA, then there exists a constant $d$ such that
$\mathrm{MLR}''_d(X)$ does not prove $\varphi$. Since $(\exists c)\, \mathrm{MLR}''_c(X)$ is
provable from $\mathrm{MLR}'_d(X)$, it follows that $(\exists c)\, \mathrm{MLR}''_c(X)$
does not prove $\varphi$. □



4.3 Initial segment complexity of nonrandom sequences 

In Theorem 18 we proved that if $X$ is a Martin-Löf random sequence such that the
axioms "$K(X \upharpoonright n) \ge n - c$" are true for all $n$, then the theory
consisting of these axioms does not prove all true $\Pi_1$-statements. What happens
if we consider sequences $X$ that are not random, but for which
"$K(X \upharpoonright n) \ge n - c$" is still true for infinitely many $n$? In contrast
to Theorem 18, there is a sequence $X$ and a constant $c$ such that the true axioms
of the form "$K(X \upharpoonright n) \ge n - c$" prove all true $\Pi_1$-statements.
Theorem 22 Fix some constant $c > 0$. There exists a sequence $X$ and an infinite
set $A \subseteq \mathbb{N}$ such that the theory consisting of the axioms

"$K(X \upharpoonright n) \ge n - c$"

for all $n \in A$ is consistent and proves all true $\Pi_1$-statements.

Proof. We order all strings by length and then lexicographically. That is,
$\sigma < \tau$ if and only if $|\sigma| < |\tau|$, or $|\sigma| = |\tau|$ and $\sigma$ is
lexicographically before $\tau$.

We construct $X$ as follows: let $y_0$ be some string with $K(y_0) < |y_0| - c$.
Inductively, let $x_n$ be the first (for the above order) string $x$ that extends
$y_n$ with $K(x) \ge |x| - c$, and let $y_{n+1}$ be some string extending $x_n$ such
that

$$K(y_{n+1}) < |y_{n+1}| - (n+1) - c.$$

Note that $x_n$ must exist, as every string has a Martin-Löf random extension,
and for a Martin-Löf random sequence $Z$

$$\lim_{n \to \infty} \bigl(K(Z \upharpoonright n) - n\bigr) = \infty.$$

Let $X = \lim_{n \to \infty} x_n$. Consider the axioms

"$K(x_n) \ge |x_n| - c$"

for all $n \in \mathbb{N}$. (That is: $A = \{|x_n| : n \in \mathbb{N}\}$ in the statement
of the theorem.) We claim that this theory can prove all true $\Pi_1$-statements. The
proof is similar to the proof of Theorem 7.

Let $A(p)$ be any algorithm. As in Theorem 7, it is sufficient to prove that
for every input $p$ such that $A(p)$ does not terminate, our theory proves this
non-termination.

Define $t_n$ to be the first $t$ such that $K^t(x) < |x| - c$ for all strings $x$ that
extend $y_n$ and come before $x_n$ in our order. We prove that from the length $|p|$
of a program $p$, we can compute a number $n$ such that the computation $A(p)$
either terminates in less than $t_n$ steps, or does not terminate at all.

Every string $p$ for which $A(p)$ terminates determines the number of steps needed
for the termination of $A(p)$. Knowing $p$ and $y_n$, we find this number $t(p)$ and
take the first $x$ that extends $y_n$ such that $K^{t(p)}(x) \ge |x| - c$. If $A(p)$
does not halt within $t_n$ steps (that is, if $t(p) \ge t_n$), then we know that
$x = x_n$. On the other hand, for every $p$ such that $A(p)$ terminates we get some
string $x$ extending $y_n$ with $K(x) \le K(p) + K(y_n) + O(1)$. By definition, $y_n$
has low complexity. Consequently

$$\begin{aligned}
K(x) &\le |p| + |y_n| - n - c + O(\log |p|) \\
     &\le |x| + |p| - n - c + O(\log |p|).
\end{aligned}$$

Given the program size $|p|$, we can find an $n$ that is large enough such that
$|p| - n + O(\log |p|)$ is negative. For such an $n$, we know that $x$ is different
from $x_n$, because $K(x) < |x| - c$. Hence, for such $n$, every terminating
computation $A(p)$ yields a string $x$ different from $x_n$, whereas a computation
that runs for at least $t_n$ steps would yield $x = x_n$. So if $A(p)$ terminates at
all, then it must do so in less than $t_n$ steps.
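
The estimate behind this choice of $n$ can be written as a single chain. This is a
sketch that only combines the bounds stated above with the standard bound
$K(p) \le |p| + O(\log |p|)$ for prefix complexity; the hidden constants are not
made explicit.

```latex
% Sketch: the complexity bound for the string x produced from a halting A(p).
\begin{align*}
  K(x) \;&\le\; K(p) + K(y_n) + O(1)
      && \text{($x$ is computed from $p$ and $y_n$)} \\
  % K(p) <= |p| + O(log |p|), and K(y_n) < |y_n| - n - c by the choice of y_n.
       &\le\; \bigl(|p| + O(\log |p|)\bigr) + \bigl(|y_n| - n - c\bigr) + O(1) \\
  % x extends y_n, hence |y_n| <= |x|.
       &\le\; |x| - c + \bigl(|p| - n + O(\log |p|)\bigr) \\
  % For n large enough compared with |p| the bracket is negative.
       &<\; |x| - c
      && \text{once } n > |p| + O(\log |p|).
\end{align*}
```

Since $K(x_n) \ge |x_n| - c$ by construction, the last line is what rules out
$x = x_n$ for a halting computation.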

As in the proof of Lemma 1, this reasoning can be formalized in PA. Having
"$K(x_n) \ge |x_n| - c$" as an axiom, we can prove that $x_n$ is the first string
extending $y_n$ such that $K(x_n) \ge |x_n| - c$. Then, given $y_n$, we can prove the
value of $t_n$. Finally, given $p$ and taking $y_n$ for $n$ suitably large, we can
prove (carrying out the above proof inside PA) that $A(p)$ either terminates in
$t_n$ steps or does not terminate at all, as required.

Remark that the sequence $X$ constructed in the proof has arbitrarily large
complexity dips in between the initial segments $x_n$ with complexity at least
$|x_n| - c$. Hence $X$ is not Martin-Löf random. This is essential by Theorem 18.
Indeed, even if we choose a Martin-Löf random $X$ and a constant $c$ small enough
such that $K(X \upharpoonright n) \ge n - c$ is not true for all $n$, it still must be
true for all but finitely many $n$. In this case the proof of Theorem 18 still works
to show that the theory consisting of all true axioms
"$K(X \upharpoonright n) \ge n - c$" does not prove all true $\Pi_1$-statements.

For plain complexity, the proof of Theorem 22 does not work. The reason is 
that not every string has an extension with high plain complexity. 

Question 2. Does there exist a sequence $X$ and a theory $T$ consisting of infinitely
many axioms of the form

"$C(X \upharpoonright n) \ge n - c$"

such that $T$ is consistent and proves all true $\Pi_1$-statements?

Note that if there does exist such a sequence $X$, then $X$ must be 2-random.
This makes the question quite different from Theorem 22, as the sequence
constructed in the proof of the theorem was necessarily non-random, whereas
Question 2 relates to the properties of random sequences.

Moreover, remark that, although there are no Turing-complete 2-random 
sequences, some corresponding theory T might still be Turing complete. 

A summary of this section and related results can be found in Figure 2.